Decision Boundary Partitioning: Variable Resolution Model-Free Reinforcement Learning
Author
Abstract
Reinforcement learning agents attempt to learn a decision policy that maximises some reward signal. This policy is derived directly from long-term value estimates of state-action pairs. In environments with real-valued state spaces, however, it is impossible to enumerate the value of every state-action pair, necessitating a function approximator in order to infer state-action values from similar states. Typically, function approximators require many parameters whose suitable values may be difficult to determine a priori. Traditional systems of this kind are also then bound to the fixed limits imposed by the initial parameters, beyond which no further improvements are possible. This paper introduces a new method to adaptively increase the resolution of a discretised action-value function based upon which regions of the state space are most important for the purposes of choosing an action. The method is motivated by similar work by Moore and Atkeson but improves upon the existing techniques insofar as it: i) is applicable to a wider class of learning tasks, ii) does not require transition or reward models to be constructed and so can also be used with a variety of model-free reinforcement learning algorithms, and iii) continues to improve upon policies even after a feasible solution to the learning problem has been found.
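To make the general idea concrete, the sketch below shows one way a decision-boundary-driven refinement rule could be paired with ordinary tabular Q-learning over a tree of state-space cells, refining only where the greedy action changes. The class names (Cell, BoundaryPartitionQ), the halving split rule, and the min_width threshold are illustrative assumptions for this sketch, not the paper's actual algorithm.

```python
# Illustrative sketch: variable-resolution, model-free Q-learning in which cells
# near a decision boundary (where the greedy action disagrees between adjacent
# visited cells) are split. All names and thresholds here are assumptions.
import numpy as np

class Cell:
    """An axis-aligned region of the state space with its own action-value estimates."""
    def __init__(self, low, high, n_actions):
        self.low = np.array(low, dtype=float)    # copies, so children can be edited safely
        self.high = np.array(high, dtype=float)
        self.q = np.zeros(n_actions)
        self.children = None                     # populated when the cell is split

    def contains(self, s):
        return bool(np.all(s >= self.low) and np.all(s < self.high))

    def leaf_for(self, s):
        """Descend to the finest cell containing state s."""
        if self.children is None:
            return self
        for child in self.children:
            if child.contains(s):
                return child.leaf_for(s)
        # points on the outer upper boundary fall into the last child
        return self.children[-1].leaf_for(s)

    def split(self, n_actions):
        """Halve the cell along its widest dimension; children inherit the Q-values."""
        dim = int(np.argmax(self.high - self.low))
        mid = (self.low[dim] + self.high[dim]) / 2.0
        lo_child = Cell(self.low, self.high, n_actions)
        lo_child.high[dim] = mid
        hi_child = Cell(self.low, self.high, n_actions)
        hi_child.low[dim] = mid
        lo_child.q[:] = self.q
        hi_child.q[:] = self.q
        self.children = [lo_child, hi_child]


class BoundaryPartitionQ:
    """Model-free Q-learning over a tree of cells, refined near decision boundaries."""
    def __init__(self, low, high, n_actions, alpha=0.1, gamma=0.99, min_width=1e-2):
        self.root = Cell(low, high, n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.min_width = alpha, gamma, min_width

    def greedy_action(self, s):
        return int(np.argmax(self.root.leaf_for(np.asarray(s, dtype=float)).q))

    def update(self, s, a, r, s_next, done):
        cell = self.root.leaf_for(np.asarray(s, dtype=float))
        next_cell = self.root.leaf_for(np.asarray(s_next, dtype=float))
        target = r if done else r + self.gamma * np.max(next_cell.q)
        cell.q[a] += self.alpha * (target - cell.q[a])
        # Refine only where it matters for action selection: if the greedy action
        # changes across the transition, the decision boundary runs nearby.
        if (cell is not next_cell
                and np.argmax(cell.q) != np.argmax(next_cell.q)
                and np.max(cell.high - cell.low) > self.min_width):
            cell.split(self.n_actions)
```

In a full agent, update would be called on every environment transition, with an exploration scheme such as epsilon-greedy wrapped around greedy_action; the key point of the sketch is that resolution is spent only where it could change which action is chosen.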
Similar resources
Elastic Resource Management with Adaptive State Space Partitioning of Markov Decision Processes
Modern large-scale computing deployments consist of complex applications running over machine clusters. An important issue in such deployments is the offering of elasticity, i.e., the dynamic allocation of resources to applications to meet fluctuating workload demands. Threshold-based approaches are typically employed, yet they are difficult to configure and optimize. Approaches based on reinforcement lea...
Variable Resolution Hierarchical RL
The contribution of this paper is to introduce heuristics that go beyond safe state abstraction in hierarchical reinforcement learning to approximate a decomposed value function. Additional improvements in time and space complexity for learning and execution may outweigh achieving less than hierarchically optimal performance and deliver anytime decision making during execution. Heuristics are...
Multiagent Reinforcement Learning in Stochastic Games with Continuous Action Spaces
We investigate the learning problem in stochastic games with continuous action spaces. We focus on repeated normal form games, and discuss issues in modelling mixed strategies and adapting learning algorithms in finite-action games to the continuous-action domain. We applied variable resolution techniques to two simple multi-agent reinforcement learning algorithms, PHC and MinimaxQ. Preliminary ...
A Dynamic Tree Structure for Incremental Reinforcement Learning of Good Behavior
This paper addresses the idea of learning by reinforcement within the theory of behaviorism. The reason for this choice is its generality, and especially that the reinforcement learning paradigm allows systems to be designed which can improve their behavior beyond that of their teacher. The role of the teacher is to define the reinforcement function, which acts as a description of the problem t...